Finding Originally Mislabels with MD-ELM
Authors
Abstract
This paper presents a methodology for detecting mislabeled samples, with a practical example from bankruptcy prediction. Mislabeled samples occur in many classification problems and can bias the training of the desired classifier. The proposed method is based on the Extreme Learning Machine (ELM) and identifies the samples most likely to be mislabeled. Two datasets are used to validate and test the methodology: a toy example (the XOR problem) and a real dataset from corporate finance (bankruptcy prediction).
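The abstract does not describe how MD-ELM itself works, so the sketch below is not the paper's algorithm. It only illustrates the setting on an XOR toy problem: a basic ELM (random tanh hidden layer, ridge-regularized linear output) is fitted to labels of which a few have been deliberately flipped, and the samples with the largest training residuals are listed as mislabel candidates. The residual-ranking heuristic, every function name, and every parameter value (`n_hidden`, `ridge`, the sample counts) are assumptions made for illustration.

```python
import math
import random

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def train_elm(X, y, n_hidden=20, ridge=1e-3, seed=0):
    """Fit a basic ELM and return it as a function x -> real-valued output."""
    rng = random.Random(seed)
    d = len(X[0])
    # Input weights and biases are drawn once at random and never trained.
    W = [[rng.uniform(-2, 2) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-2, 2) for _ in range(n_hidden)]

    def hidden(x):
        return [math.tanh(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_i)
                for w, b_i in zip(W, b)]

    H = [hidden(x) for x in X]
    # Only the output weights are fitted: (H^T H + ridge * I) beta = H^T y
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(n_hidden)]
    beta = solve(A, rhs)
    return lambda x: sum(b_i * h_i for b_i, h_i in zip(beta, hidden(x)))

def make_xor(n, n_flipped, seed=0):
    """Sample n points in [-1, 1]^2 with XOR labels in {-1, +1}, then flip some."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [1.0 if (x[0] > 0) != (x[1] > 0) else -1.0 for x in X]
    flipped = rng.sample(range(n), n_flipped)
    for i in flipped:
        y[i] = -y[i]
    return X, y, sorted(flipped)

def rank_suspects(X, y, k, **elm_kwargs):
    """Return the k sample indices with the largest absolute training residual."""
    model = train_elm(X, y, **elm_kwargs)
    residuals = [abs(t - model(x)) for x, t in zip(X, y)]
    return sorted(range(len(X)), key=lambda i: residuals[i], reverse=True)[:k]

X, y, flipped = make_xor(n=200, n_flipped=10)
suspects = rank_suspects(X, y, k=10)
print("flipped :", flipped)
print("suspects:", sorted(suspects))
```

How many of the flipped indices show up among the suspects depends on the random hidden layer and on the chosen parameters; the sketch makes no claim about detection accuracy.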
Similar resources
TROP-ELM: A double-regularized ELM using LARS and Tikhonov regularization
This paper proposes an improvement of the optimally pruned extreme learning machine (OP-ELM) in the form of an L2 regularization penalty applied within the OP-ELM. The OP-ELM originally proposes a wrapper methodology around the extreme learning machine (ELM) meant to reduce the sensitivity of the ELM to irrelevant variables and obtain more parsimonious models thanks to neuron pruning. The ...
Dissimilarity based ensemble of extreme learning machine for gene expression data classification
Extreme Learning Machine (ELM) has salient features such as fast learning speed and excellent generalization performance. However, a single extreme learning machine is unstable in data classification. To overcome this drawback, more and more researchers consider using ensemble of ELMs. This paper proposes a method integrating voting-based extreme learning machines (V-ELM) with dissimilarity (D-...
Classification of Hippocampal Region using Extreme Learning Machine
Important brain structures such as the hippocampus are usually segmented manually by doctors. Hybrid approaches that combine machine learning with neuroimaging techniques have shown promising results in segmenting subcortical structures. Moreover, the Extreme Learning Machine (ELM) is known to be a superior machine learning technique. This study will inve...
Phylogeny-aware identification and correction of taxonomically mislabeled sequences
Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases c...
Finding an Appropriate Cut-off Point for Neck Circumference to Determine Overweight and Obesity in a Large Sample of Iranian Adults
Objective: Obesity is a major public health concern and there are different ways to detect it in a population. The aim of the present study is to evaluate neck circumference (NC) as a simple and practical measure. Materials and Methods: This cross-sectional survey utilized data from the Yazd Health Study (YaHS), which is a population-based cohort study. In brief, 9962 individuals aged 20-70 years...